{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Getting started with Transformers and TPU using PyTorch\n",
    "\n",
    "\n",
    "Tensor Processing Units (TPU) are specialized accelerators developed by Google to speed up machine learning tasks. They are built from the ground up with a focus on machine & deep learning workloads. \n",
    "\n",
    "TPUs are available on the [Google Cloud](https://cloud.google.com/tpu/docs/tpus) and can be used with popular deep learning frameworks, including [TensorFlow](https://www.tensorflow.org/), [JAX](https://jax.readthedocs.io/en/latest/notebooks/quickstart.html), and [PyTorch](https://pytorch.org/get-started/locally/).\n",
    "\n",
    "This blog post will cover how to get started with Hugging Face Transformers and TPUs using PyTorch and [accelerate](https://huggingface.co/docs/accelerate/index). You will learn how to fine-tune a BERT model for Text Classification using the newest Google Cloud TPUs. \n",
    "\n",
    "You will learn how to:\n",
    "\n",
    "1. Launch TPU VM on Google Cloud\n",
    "2. Setup Jupyter environment & install Transformers\n",
    "3. Load and prepare the dataset\n",
    "4. Fine-tune BERT on the TPU with the Hugging Face `accelerate` \n",
    "\n",
    "Before we can start, make sure you have a **[Hugging Face Account](https://huggingface.co/join)** to save artifacts and experiments.\n",
    "\n",
    "## 1. Launch TPU VM on Google Cloud\n",
    "\n",
    "The first step is to create a TPU development environment. We are going to use the [Google Cloud CLI](https://cloud.google.com/sdk/docs/install) `gcloud` to create a cloud TPU VM using PyTorch 1.13 image. \n",
    "\n",
    "If you don’t have the `cloud` installed check out the [documentation](https://cloud.google.com/sdk/docs/install) or run the command below. \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "curl https://sdk.cloud.google.com | bash\n",
    "exec zsh -l\n",
    "gcloud init"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can now create our cloud TPU VM with our preferred region, project and version. \n",
    "\n",
    "\n",
    "_Note: Make sure to have the [Cloud TPU API](https://console.cloud.google.com/compute/tpus/) enabled to create your Cloud TPU VM_\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "gcloud compute tpus tpu-vm create bert-example \\\n",
    "--zone=europe-west4-a \\\n",
    "--accelerator-type=v3-8 \\\n",
    "--version=tpu-vm-pt-1.13"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Setup Jupyter environment & install Transformers\n",
    "\n",
    "Our cloud TPU VM is now running, and we can ssh into it, but who likes to develop inside a terminal? We want to set up a **`Jupyter`** environment, which we can access through our local browser. For this, we need to add a port for forwarding in the `gcloud` ssh command, which will tunnel our localhost traffic to the cloud TPU."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "gcloud compute tpus tpu-vm ssh bert-example \\\n",
    "--zone europe-west4-a \\\n",
    "-- -L 8080:localhost:8080 "
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before we can access our environment, we need to install `jupyter` and the Hugging Face Libraries, including `transformers` and `datasets`. Running the following command will install all the required packages."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pip3 install jupyter transformers datasets evaluate accelerate tensorboard scikit-learn  --upgrade\n",
    "# install specific markupsafe version to not break\n",
    "pip3 markupsafe==2.0.1"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can now start our `jupyter` server. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "python3 -m notebook --allow-root --port=8080"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You should see a familiar `jupyter` output with a URL to the notebook.\n",
    "\n",
    "`http://localhost:8080/?token=8c1739aff1755bd7958c4cfccc8d08cb5da5234f61f129a9`\n",
    "\n",
    "We can click on it, and a `jupyter` environment opens in our local browser.\n",
    "\n",
    "We can now create a new notebook and test to see if we have access to the TPUs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os \n",
    "\n",
    "# make the TPU available accelerator to torch-xla\n",
    "os.environ[\"XRT_TPU_CONFIG\"]=\"localservice;0;localhost:51011\"\n",
    "\n",
    "import torch\n",
    "import torch_xla.core.xla_model as xm\n",
    "\n",
    "device = xm.xla_device()\n",
    "t1 = torch.randn(3,3,device=device)\n",
    "t2 = torch.randn(3,3,device=device)\n",
    "print(t1 + t2)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Awesome! 🎉 We can use our TPU with PyTorch. Let's get to our example. \n",
    "\n",
    "**NOTE: make sure to restart your notebook to not longer allocate a TPU with the tensor we created!**\n",
    "\n",
    "## 3. Load and prepare the dataset\n",
    "\n",
    "We are training a Text Classification model on the [BANKING77](https://huggingface.co/datasets/banking77) dataset to keep the example straightforward. The BANKING77 dataset provides a fine-grained set of intents (classes) in a banking/finance domain. It comprises 13,083 customer service queries labeled with 77 intents. It focuses on fine-grained single-domain intent detection.\n",
    "\n",
    "This is the same dataset we used for the [“Getting started with Pytorch 2.0 and Hugging Face Transformers”](https://www.philschmid.de/getting-started-pytorch-2-0-transformers), which will help us to compare the performance later. \n",
    "\n",
    "We will use the `load_dataset()` method from the [🤗 Datasets](https://huggingface.co/docs/datasets/index) library to load the `banking77`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from datasets import load_dataset\n",
    "\n",
    "# Dataset id from huggingface.co/dataset\n",
    "dataset_id = \"banking77\"\n",
    "\n",
    "# Load raw dataset\n",
    "raw_dataset = load_dataset(dataset_id)\n",
    "\n",
    "print(f\"Train dataset size: {len(raw_dataset['train'])}\")\n",
    "print(f\"Test dataset size: {len(raw_dataset['test'])}\")\n",
    "\n",
    "# Train dataset size: 10003\n",
    "# Test dataset size: 3080"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let’s check out an example of the dataset.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from random import randrange\n",
    "\n",
    "random_id = randrange(len(raw_dataset['train']))\n",
    "raw_dataset['train'][random_id]\n",
    "# {'text': 'How can I change my PIN without going to the bank?', 'label': 21}"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To train our model, we need to convert our \"Natural Language\" to token IDs. This is done by a Tokenizer, which tokenizes the inputs (including converting the tokens to their corresponding IDs in the pre-trained vocabulary) if you want to learn more about this, out [chapter 6](https://huggingface.co/course/chapter6/1?fw=pt) of the [Hugging Face Course](https://huggingface.co/course/chapter1/1).\n",
    "\n",
    "Since TPUs expect a fixed shape of inputs, we need to make sure to truncate or pad all samples to the same length. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from transformers import AutoTokenizer\n",
    "\n",
    "# Model id to load the tokenizer\n",
    "model_id = \"bert-base-uncased\"\n",
    "# Load Tokenizer\n",
    "tokenizer = AutoTokenizer.from_pretrained(model_id)\n",
    "\n",
    "# Tokenize helper function\n",
    "def tokenize(batch):\n",
    "    return tokenizer(batch['text'], padding='max_length', truncation=True,return_tensors=\"pt\")\n",
    "\n",
    "# Tokenize dataset\n",
    "raw_dataset =  raw_dataset.rename_column(\"label\", \"labels\") # to match Trainer\n",
    "tokenized_dataset = raw_dataset.map(tokenize, batched=True, remove_columns=[\"text\"])\n",
    "tokenized_dataset = tokenized_dataset.with_format(\"torch\")\n",
    "\n",
    "print(tokenized_dataset[\"train\"].features.keys())\n",
    "# dict_keys(['input_ids', 'token_type_ids', 'attention_mask','lable'])"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We are using Hugging Face [accelerate](https://huggingface.co/docs/accelerate/index) to train our model in this example. [Accelerate](https://huggingface.co/docs/accelerate/index) is a library to easily write PyTorch training loops for agnostic Hardware setups, which makes it super easy to write TPU training methods without the need to know any XLA features. \n",
    "\n",
    "## 4. Fine-tune BERT on the TPU with the Hugging Face `accelerate`\n",
    "\n",
    "[Accelerate](https://huggingface.co/docs/accelerate/index) is enables PyTorch users run PyTorch training across any distributed configuration by adding just four lines of code! Built on `torch_xla` and `torch.distributed`, 🤗 Accelerate takes care of the heavy lifting, so you don’t have to write any custom code to adapt to these platforms.\n",
    "\n",
    "Accelerate implements a [notebook launcher](https://huggingface.co/docs/accelerate/basic_tutorials/notebook), which allows you to easily start your training jobs from a notebook cell rather than needing to use `torchrun` or other launcher, which makes experimenting so much easier, since we can write all the code in the notebook rather than the need to create long and complex python scripts. We are going to use the `notebook_launcher` which will allow us to skip the `accelerate config` command, since we define our environment inside the notebook. \n",
    "\n",
    "The two most important things to remember for training on TPUs is that the `accelerator` object has to be defined inside the `training_function`, and your model should be created outside the training function.\n",
    "\n",
    "We will load our model with the `AutoModelForSequenceClassification` class from the [Hugging Face Hub](https://huggingface.co/bert-base-uncased). This will initialize the pre-trained BERT weights with a classification head on top. Here we pass the number of classes (77) from our dataset and the label names to have readable outputs for inference."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from transformers import AutoModelForSequenceClassification\n",
    "\n",
    "# Model id to load the tokenizer\n",
    "model_id = \"bert-base-uncased\"\n",
    "\n",
    "# Prepare model labels - useful for inference\n",
    "labels = tokenized_dataset[\"train\"].features[\"labels\"].names\n",
    "num_labels = len(labels)\n",
    "label2id, id2label = dict(), dict()\n",
    "for i, label in enumerate(labels):\n",
    "    label2id[label] = str(i)\n",
    "    id2label[str(i)] = label\n",
    "\n",
    "# Download the model from huggingface.co/models\n",
    "model = AutoModelForSequenceClassification.from_pretrained(\n",
    "    model_id, num_labels=num_labels, label2id=label2id, id2label=id2label\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We evaluate our model during training. We use the `evaluate` library to calculate the [f1 metric](https://huggingface.co/spaces/evaluate-metric/f1) during training on our test split. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import evaluate\n",
    "import numpy as np\n",
    "\n",
    "# Metric Id\n",
    "metric = evaluate.load(\"f1\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can now write our `train_function`. If you want to learn more about how to adjust a basic PyTorch training loop to `accelerate` you can take a look at the [Migrating your code to 🤗 Accelerate guide](https://huggingface.co/docs/accelerate/basic_tutorials/migration)**.**\n",
    "\n",
    "We are using a magic cell `%%writefile` to write the `train_function` to an external `train.py` module to properly use it in `ipython`. The `train.py` module also includes a `create_dataloaders` method, which will be used to create our `DataLoaders` for training using the tokenized dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%writefile train.py\n",
    "\n",
    "from datasets import load_dataset, load_metric\n",
    "from accelerate import Accelerator\n",
    "from transformers import (\n",
    "    AdamW,\n",
    "    get_linear_schedule_with_warmup\n",
    ")\n",
    "from tqdm.auto import tqdm\n",
    "import datasets\n",
    "import transformers\n",
    "import torch\n",
    "from torch.utils.data import DataLoader\n",
    "\n",
    "def create_dataloaders(tokenized_dataset, train_batch_size=8, eval_batch_size=32):\n",
    "    train_dataloader = DataLoader(\n",
    "        tokenized_dataset[\"train\"], shuffle=True, batch_size=train_batch_size\n",
    "    )\n",
    "    eval_dataloader = DataLoader(\n",
    "        tokenized_dataset[\"test\"], shuffle=False, batch_size=eval_batch_size\n",
    "    )\n",
    "    return train_dataloader, eval_dataloader\n",
    "\n",
    "def training_function(model,hyperparameters,metric,tokenized_dataset):\n",
    "    # Initialize accelerator with bf16\n",
    "    accelerator = Accelerator()# mixed_precision=\"bf16\")\n",
    "\n",
    "    # To have only one message (and not 8) per logs of Transformers or Datasets, we set the logging verbosity\n",
    "    if accelerator.is_main_process:\n",
    "        datasets.utils.logging.set_verbosity_warning()\n",
    "        transformers.utils.logging.set_verbosity_info()\n",
    "    else:\n",
    "        datasets.utils.logging.set_verbosity_error()\n",
    "        transformers.utils.logging.set_verbosity_error()\n",
    "\n",
    "    train_dataloader, eval_dataloader = create_dataloaders(\n",
    "        tokenized_dataset,train_batch_size=hyperparameters[\"per_tpu_train_batch_size\"], eval_batch_size=hyperparameters[\"per_tpu_eval_batch_size\"]\n",
    "    )\n",
    "\n",
    "    # Instantiate optimizer\n",
    "    optimizer = AdamW(params=model.parameters(), lr=hyperparameters[\"learning_rate\"])\n",
    "\n",
    "    # Prepare everything\n",
    "    model, optimizer, train_dataloader, eval_dataloader = accelerator.prepare(\n",
    "        model, optimizer, train_dataloader, eval_dataloader\n",
    "    )\n",
    "\n",
    "    num_epochs = hyperparameters[\"num_epochs\"]\n",
    "    # Instantiate learning rate scheduler\n",
    "    lr_scheduler = get_linear_schedule_with_warmup(\n",
    "        optimizer=optimizer,\n",
    "        num_warmup_steps=100,\n",
    "        num_training_steps=len(train_dataloader) * num_epochs,\n",
    "    )\n",
    "\n",
    "    # Add a progress bar to keep track of training.\n",
    "    progress_bar = tqdm(range(num_epochs * len(train_dataloader)), disable=not accelerator.is_main_process)\n",
    "    # Now we train the model\n",
    "    for epoch in range(num_epochs):\n",
    "        model.train()\n",
    "        for step, batch in enumerate(train_dataloader):\n",
    "            outputs = model(**batch)\n",
    "            loss = outputs.loss\n",
    "            accelerator.backward(loss)\n",
    "            \n",
    "            optimizer.step()\n",
    "            lr_scheduler.step()\n",
    "            optimizer.zero_grad()\n",
    "            \n",
    "            progress_bar.update(1)\n",
    "            \n",
    "\n",
    "        # run evaluation after the training epoch\n",
    "        model.eval()\n",
    "        all_predictions = []\n",
    "        all_labels = []\n",
    "\n",
    "        for step, batch in enumerate(eval_dataloader):\n",
    "            with torch.no_grad():\n",
    "                outputs = model(**batch)\n",
    "            predictions = outputs.logits.argmax(dim=-1)\n",
    "\n",
    "            # We gather predictions and labels from the 8 TPUs to have them all.\n",
    "            all_predictions.append(accelerator.gather(predictions))\n",
    "            all_labels.append(accelerator.gather(batch[\"labels\"]))\n",
    "\n",
    "        # Concatenate all predictions and labels.\n",
    "        all_predictions = torch.cat(all_predictions)[:len(tokenized_dataset[\"test\"])]\n",
    "        all_labels = torch.cat(all_labels)[:len(tokenized_dataset[\"test\"])]\n",
    "\n",
    "        eval_metric = metric.compute(predictions=all_predictions, references=all_labels, average=\"weighted\")\n",
    "        accelerator.print(f\"epoch {epoch}:\", eval_metric)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The last step is to define the `hyperparameters` we use for our training.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "hyperparameters = {\n",
    "    \"learning_rate\": 3e-4,\n",
    "    \"num_epochs\": 3,\n",
    "    \"per_tpu_train_batch_size\": 32, # Actual batch size will this x 8\n",
    "    \"per_tpu_eval_batch_size\": 8, # Actual batch size will this x 8\n",
    "}"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And we're ready for launch! It's super easy with the `notebook_launcher` from the Accelerate library."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from train import training_function\n",
    "from accelerate import notebook_launcher\n",
    "import os\n",
    "\n",
    "# set environment variable to spawn xmp \n",
    "# https://github.com/huggingface/accelerate/issues/967\n",
    "os.environ[\"KAGGLE_TPU\"] = \"yes\" # adding a fake env to launch on TPUs\n",
    "os.environ[\"TPU_NAME\"] = \"dummy\"\n",
    "# make the TPU available accelerator to torch-xla\n",
    "os.environ[\"XRT_TPU_CONFIG\"]=\"localservice;0;localhost:51011\"\n",
    "\n",
    "# args\n",
    "args = (model, hyperparameters, metric, tokenized_dataset)\n",
    "\n",
    "# launch training\n",
    "notebook_launcher(training_function, args)\n",
    "\n",
    "# epoch 0: {'f1': 0.28473517320655745}\n",
    "# epoch 1: {'f1': 0.814198544360063}\n",
    "# epoch 2: {'f1': 0.915311713296595}"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*Note: You may notice that training seems exceptionally slow at first. This is because TPUs first run through a few batches of data to see how much memory to allocate before utilizing this configured memory allocation extremely efficiently.*\n",
    "\n",
    "We are using 8x `v3` TPUs with a global batch size of `256`, achieving `481 train_samples_per_second`\n",
    "\n",
    "The training with compilation and evaluation took `220` seconds and achieved an **`f1`** score of **`0.915`**."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "hf",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.8.12 | packaged by conda-forge | (default, Oct 12 2021, 21:25:50) \n[Clang 11.1.0 ]"
  },
  "orig_nbformat": 4,
  "vscode": {
   "interpreter": {
    "hash": "5fcf248a74081676ead7e77f54b2c239ba2921b952f7cbcdbbe5427323165924"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
